As in the presentation, we will use data from the Public Use File (PUF) of the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany for this exercise. You should have downloaded the dataset in .csv format (or should do so now) and saved it in a folder called data within the folder containing the materials for this workshop. Also remember that it is helpful to consult the codebook for the dataset.

That being said, let’s get wrangling…

…but before we can do that, we need to load the tidyverse package(s) and import the data.

library(tidyverse)

gesis_panel_corona <- read_csv2("../data/ZA5667_v1-1-0.csv")
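
The import above uses read_csv2() rather than read_csv(). read_csv2() is the readr variant for files that use semicolons as field separators and commas as decimal marks, which suggests that the PUF file is stored in that format. The following is only a minimal sketch of that behavior on made-up text; toy_text and toy are hypothetical names and the values are not from the real dataset.

```r
library(readr)

# Toy file content (not the real data): ";" separates fields, "," is the decimal mark
toy_text <- "id;score\n1;2,5\n2;3,75\n"

# I() tells readr to treat the string as literal data instead of a file path
toy <- read_csv2(I(toy_text), show_col_types = FALSE)

# The score column is parsed as numeric, with "," read as the decimal mark
toy$score
```

If a semicolon-separated file were read with read_csv() instead, each row would typically end up in a single column, so a quick look at ncol() after importing is a cheap sanity check.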

1

Let’s first check how many rows and columns the dataset has.
You can use separate functions for printing the number of rows and the number of columns or one function that prints both.
nrow(gesis_panel_corona)
## [1] 3765
ncol(gesis_panel_corona)
## [1] 137
dim(gesis_panel_corona)
## [1] 3765  137

2

Next, let’s catch a glimpse (hint, hint) of the dataset.
glimpse(gesis_panel_corona)

3

Now check out the names of the columns/variables in the dataset.
names(gesis_panel_corona)
##   [1] "za_number"             "version"               "doi"                  
##   [4] "id"                    "cohort"                "sex"                  
##   [7] "age_cat"               "education_cat"         "intention_to_vote"    
##  [10] "choice_of_party"       "political_orientation" "marstat"              
##  [13] "household"             "hzcy001a"              "hzcy002a"             
##  [16] "hzcy003a"              "hzcy004a"              "hzcy005a"             
##  [19] "hzcy006a"              "hzcy007a"              "hzcy008a"             
##  [22] "hzcy009a"              "hzcy010a"              "hzcy011a"             
##  [25] "hzcy012a"              "hzcy013a"              "hzcy014a"             
##  [28] "hzcy015a"              "hzcy016a"              "hzcy018a"             
##  [31] "hzcy019a"              "hzcy020a"              "hzcy021a"             
##  [34] "hzcy022a"              "hzcy023a"              "hzcy024a"             
##  [37] "hzcy025a"              "hzcy026a"              "hzcy027a"             
##  [40] "hzcy028a"              "hzcy029a"              "hzcy030a"             
##  [43] "hzcy031a"              "hzcy032a"              "hzcy033a"             
##  [46] "hzcy034a"              "hzcy035a"              "hzcy036a"             
##  [49] "hzcy037a"              "hzcy038a"              "hzcy039a"             
##  [52] "hzcy040a"              "hzcy041a"              "hzcy042a"             
##  [55] "hzcy043a"              "hzcy044a"              "hzcy045a"             
##  [58] "hzcy046a"              "hzcy047a"              "hzcy048a"             
##  [61] "hzcy049a"              "hzcy050a"              "hzcy051a"             
##  [64] "hzcy052a"              "hzcy053a"              "hzcy054a"             
##  [67] "hzcy055a"              "hzcy056a"              "hzcy057a"             
##  [70] "hzcy058a"              "hzcy059a"              "hzcy060a"             
##  [73] "hzcy061a"              "hzcy062a"              "hzcy063a"             
##  [76] "hzcy064a"              "hzcy065a"              "hzcy066a"             
##  [79] "hzcy067a"              "hzcy068a"              "hzcy069a"             
##  [82] "hzcy070a"              "hzcy071a"              "hzcy072a"             
##  [85] "hzcy073a"              "hzcy074a"              "hzcy075a"             
##  [88] "hzcy076a"              "hzcy077a"              "hzcy078a"             
##  [91] "hzcy079a"              "hzcy080a"              "hzcy081a"             
##  [94] "hzcy083a"              "hzcy084a"              "hzcy085a"             
##  [97] "hzcy086a"              "hzcy087a"              "hzcy088a"             
## [100] "hzcy089a"              "hzcy090a"              "hzcy091a"             
## [103] "hzcy092a"              "hzcy093a"              "hzcy095a"             
## [106] "hzcy096a"              "hzcy097a"              "hzcy098a"             
## [109] "hzcy099a"              "hzza001a"              "hzza002a"             
## [112] "hzza003a"              "hzzq009a"              "hzzq016b"             
## [115] "hzzq023a"              "hzzp201a"              "hzzp204a"             
## [118] "hzzp207a"              "hzzr001a"              "hzzr002a"             
## [121] "hzzr003a"              "hzzr004a"              "hzzr005a"             
## [124] "hzzr006a"              "hzzr007a"              "hzzr008a"             
## [127] "hzzr009a"              "hzzr010a"              "hzzr011a"             
## [130] "hzzr012a"              "hzzr013a"              "hzzr014a"             
## [133] "hzzr015a"              "hzzr016a"              "hzzr017a"             
## [136] "hzzr018a"              "hzzr019a"

We see here that most of the names are not very descriptive, which is something that we might want to change.

4

Rename the variable hzcy053a to employment_march and hzcy071a to children using base R and then rename hzcy044a to trust_doctor and hzcy050a to trust_moh using a function from the tidyverse package dplyr.
The base R function for this is colnames(), and the dplyr function is rename().
# Base R
colnames(gesis_panel_corona)[colnames(gesis_panel_corona) == "hzcy053a"] <- "employment_march"
colnames(gesis_panel_corona)[colnames(gesis_panel_corona) == "hzcy071a"] <- "children"

# tidyverse (dplyr)
gesis_panel_corona <- gesis_panel_corona %>% 
  rename(trust_doctor = hzcy044a, # new_name = old_name
         trust_moh = hzcy050a)
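
When many variables need new names, writing each pair inside rename() gets tedious. One possible alternative, sketched here on a made-up data frame, is to keep the mapping in a named character vector and pass it to rename() via all_of(). The objects toy and lookup are hypothetical and only serve as an illustration.

```r
library(dplyr)

# Toy data frame standing in for the real dataset (values are made up)
toy <- tibble(id = 1:3,
              hzcy044a = c(4, 5, 3),
              hzcy050a = c(2, 4, 4))

# Named character vector: names are the new names, values are the old names
lookup <- c(trust_doctor = "hzcy044a",
            trust_moh = "hzcy050a")

# all_of() errors if one of the old names does not exist,
# which helps to catch typos in variable names
toy <- toy %>% 
  rename(all_of(lookup))

names(toy)
```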

For the remainder of this exercise, we will focus on functions from the tidyverse. Of course, if you want to, you can also use base R to solve the tasks, or, if you are extra ambitious, you can use both.

5

Select the following variables from the dataset and assign them to an object called demo: sex, age_cat, education_cat, marstat, household, children.
You need to use the select() function from dplyr.
demo <- gesis_panel_corona %>% 
  select(sex:education_cat,
         marstat,
         household,
         children)
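
For those who want to try the base R route mentioned above, selecting columns can be done by indexing with a character vector of names. This is a sketch on a made-up data frame (toy, demo_vars, and demo_toy are hypothetical names), not the workshop's reference solution.

```r
# Toy data frame with the same column names as the real data (values are made up)
toy <- data.frame(id = 1:3,
                  sex = c(1, 2, 1),
                  age_cat = c(3, 7, 10),
                  education_cat = c(2, 3, 1),
                  marstat = c(1, 1, 3),
                  household = c(2, 4, 1),
                  children = c(1, 2, 1))

# Names of the columns to keep
demo_vars <- c("sex", "age_cat", "education_cat",
               "marstat", "household", "children")

# Base R: index the columns by name
demo_toy <- toy[, demo_vars]

names(demo_toy)
```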

6

Filter the demo dataset, so that it only contains married men.
You should probably consult the codebook for the dataset. If you do that, you will see that the code for men is 1 and the one for married is also 1.
married_men <- demo %>% 
  filter(sex == 1,
         marstat == 1)
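
A base R version of this filter is sketched below on made-up data (toy and married_men_toy are hypothetical names). One difference worth knowing: filter() silently drops rows where the condition evaluates to NA, whereas plain logical indexing in base R returns a row of NAs for them. Wrapping the condition in which() avoids that.

```r
# Toy data (made up): the last row has a missing value for sex
toy <- data.frame(sex = c(1, 2, 1, NA),
                  marstat = c(1, 1, 3, 1))

# which() keeps only the positions where the condition is TRUE,
# so rows where the condition is NA are dropped, as filter() would do
married_men_toy <- toy[which(toy$sex == 1 & toy$marstat == 1), ]

nrow(married_men_toy)
```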

7

Using a function from the naniar package, recode the values -99, -77, -88, -33, -22, and -11 as missing for all variables in the demo dataframe.
You need to create a vector with the missing values to replace values with NA in all variables.
library(naniar)

missings <- c(-99, -77, -88, -33, -22, -11)

demo <- demo %>% 
  replace_with_na_all(condition = ~.x %in% missings)
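
If naniar is not available, a similar result can probably be achieved with dplyr alone by combining mutate() and across(). The sketch below uses a made-up tibble; toy and toy_clean are hypothetical names, and this is an alternative technique, not the function asked for in the task.

```r
library(dplyr)

# Toy data (made up) containing some of the negative missing-value codes
toy <- tibble(sex = c(1, 2, -99),
              marstat = c(1, -77, 3),
              household = c(2, 4, -11))

missings <- c(-99, -77, -88, -33, -22, -11)

# For every column, replace values that appear in `missings` with NA
toy_clean <- toy %>% 
  mutate(across(everything(), ~ replace(.x, .x %in% missings, NA)))

# Number of NA values across the whole tibble
sum(is.na(toy_clean))
```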

8

How many complete cases (without missing values for any of the variables) are there in the demo dataframe? Do not assign the result to an object.
You can use a function from the tidyr package to check this. Do not overwrite the demo object.
demo %>% 
  drop_na() %>% 
  nrow()
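
The same count can also be obtained in base R with complete.cases(), which returns one logical value per row. A minimal sketch on made-up data (toy is a hypothetical name):

```r
# Toy data (made up): rows 2 and 3 each contain a missing value
toy <- data.frame(sex = c(1, 2, NA, 1),
                  marstat = c(1, NA, 3, 2),
                  household = c(2, 4, 1, 3))

# TRUE for rows without any NA; summing the logical vector counts them
n_complete <- sum(complete.cases(toy))

n_complete
```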

9

Recode the age_cat variable into an ordered factor with 5 levels called age_5_cat.
Again, consulting the codebook for the dataset helps you in picking the right values and labels for the new variable. The helper function between() will be helpful here.
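age_levels <- c("<= 25 years", "26 to 40 years", "41 to 50 years",
                "51 to 65 years", "> 65 years")

demo <- demo %>% 
  mutate(age_5_cat = case_when(
    between(age_cat, 1, 2) ~ "<= 25 years",
    between(age_cat, 3, 4) ~ "26 to 40 years",
    between(age_cat, 5, 6) ~ "41 to 50 years",
    between(age_cat, 7, 8) ~ "51 to 65 years",
    age_cat > 8 ~ "> 65 years"),
    # case_when() returns a character vector, so convert it to an ordered factor
    age_5_cat = factor(age_5_cat, levels = age_levels, ordered = TRUE))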
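
After a recoding step like this, it is worth checking that the new variable really is an ordered factor with the levels in the intended order. The sketch below shows such a check on made-up category codes; toy, age_group, and the labels "younger", "middle", and "older" are placeholders, not codebook labels.

```r
library(dplyr)

# Toy category codes (made up)
toy <- tibble(age_cat = c(1, 4, 6, 9, 10))

toy <- toy %>% 
  mutate(age_group = case_when(
    age_cat <= 4 ~ "younger",
    age_cat <= 8 ~ "middle",
    age_cat > 8 ~ "older"),
    # Convert the character result into an ordered factor
    age_group = factor(age_group,
                       levels = c("younger", "middle", "older"),
                       ordered = TRUE))

# Checks: is the factor ordered, and are the levels in the intended order?
is.ordered(toy$age_group)
levels(toy$age_group)

# Ordered factors support comparisons between levels
toy$age_group < "older"
```

With an unordered factor, the last comparison would return NA with a warning, which makes it a quick way to spot a recoding that did not produce an ordered factor.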